Simple Random Sampling
versus
Stratified Sampling

2015 MLB player salaries

Let's investigate the average salary of four teams:

  • Los Angeles Dodgers
  • New York Yankees
  • Tampa Bay Rays
  • Arizona Diamondbacks
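library(readr)  # read_csv()
library(dplyr)  # %>%, select(), filter(), group_by(), summarize()
# URL recorded for reference; it is not used below
url <- "http://www.usatoday.com/sports/mlb/salaries/"
# Read the salary file and keep only the columns needed
salaries <- read_csv("mlb_salaries.csv") %>%
  select(NAME, TEAM, POS, SALARY)
# Restrict to the four teams of interest: these players are the population
salaries_trim <- salaries %>% 
  filter(TEAM %in% c("LAD", "NYY", "TB", "ARI"))
# Mean salary and number of players for each team
salaries_summary <- salaries_trim %>% group_by(TEAM) %>%
  summarize(mean_sal = mean(SALARY), count_team = n())
# Population mean over the four teams (salaries_trim, not every team in the file)
grand_mean <- salaries_trim %>% summarize(mean_salary = mean(SALARY))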

  • 120 player salaries available for those four teams
  • The mean salary for ALL 120 players on those four teams is \[\mu = \$4,214,614\]

  • Remember, \(\mu\) is a population parameter. In practice we rarely have the whole population to work with.

How do SRS and Stratified Sampling compare in estimating \(\mu\)?

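n <- 40  # SRS sample size
set.seed(20160129)
# Draw 40 of the 120 players without regard to team, then average their salaries
mean_srs <- salaries_trim %>% 
  sample_n(n) %>%
  summarize(mean_srs_salary = mean(SALARY))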
  • Suppose we select 40 players at random from the 120 total

  • \(\bar{x}_{SRS} = \$4,248,380\)

Now for stratified sampling

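strat_n <- 10  # players sampled per team
# Sample 10 players within each team and compute each team's sample mean
mean_strat_by_team <- salaries_trim %>% group_by(TEAM) %>%
  sample_n(strat_n) %>%
  summarize(mean_by_team = mean(SALARY))
# Average the four team means (equal weights)
mean_strat <- mean_strat_by_team %>%
  summarize(mean_strat_salary = mean(mean_by_team))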
  • Let's select 10 players from each of the 4 teams

  • \(\bar{x}_{STRAT} = \$3,998,016\)

  • SRS: absolute error \(|\bar{x}_{SRS} - \mu|\) of $33,766.27

  • Stratified: absolute error \(|\bar{x}_{STRAT} - \mu|\) of $216,597.90
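
Each of the two figures above is the absolute difference between one sample mean and the population mean. A quick check in R, using the rounded values reported on these slides (so the results match the figures above only to the nearest dollar):

```r
# Values as reported on the slides, rounded to the nearest dollar
mu         <- 4214614   # population mean for the 120 players
xbar_srs   <- 4248380   # mean of the simple random sample
xbar_strat <- 3998016   # mean of the stratified sample

# Absolute difference between each estimate and the population mean
abs_diff_srs   <- abs(xbar_srs - mu)
abs_diff_strat <- abs(xbar_strat - mu)

abs_diff_srs    # 33766
abs_diff_strat  # 216598
```

These numbers describe one sample of each kind; a different seed would give different differences.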

So is this evidence that simple random sampling is clearly better?

Not so fast!
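
A single sample of each kind cannot settle the question, because both estimates change from sample to sample. One way to compare the two methods is to repeat each scheme many times and look at the spread of the resulting means. The sketch below does this in base R. The salary file is not reproduced here, so it uses made-up salaries for four hypothetical teams (A to D) of 30 players each; every name and number in it is an assumption for illustration, not the 2015 data.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

# Hypothetical population: 4 teams x 30 players, each team with its own salary level
teams <- rep(c("A", "B", "C", "D"), each = 30)
team_level <- c(A = 8e6, B = 7e6, C = 2.5e6, D = 2e6)   # made-up typical salaries
salary <- unname(team_level[teams]) * rlnorm(length(teams), meanlog = 0, sdlog = 0.3)
mu <- mean(salary)  # population mean of the made-up salaries

# One simple random sample of n players from the whole population
srs_mean <- function(n) mean(sample(salary, n))

# One stratified sample: n_h players per team, team means weighted by N_h / N
strat_mean <- function(n_h) {
  team_means <- tapply(salary, teams, function(s) mean(sample(s, n_h)))
  team_sizes <- tapply(salary, teams, length)
  sum(team_means * team_sizes) / length(salary)
}

# Repeat each scheme many times to approximate its sampling distribution
reps <- 2000
srs_means   <- replicate(reps, srs_mean(40))
strat_means <- replicate(reps, strat_mean(10))

# Compare the centre and the spread of the two sets of estimates
c(mu = mu, srs = mean(srs_means), strat = mean(strat_means))
c(sd_srs = sd(srs_means), sd_strat = sd(strat_means))
```

With equal-sized teams the weights \(N_h / N\) are all 1/4, so the weighted estimate equals the plain mean of the four team means. When teams differ a lot in typical salary, as in this made-up population, the stratified means will usually be less spread out than the SRS means.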

[Figures: corrs1, corrs2]

If you learn one thing in this class…

[Figure: corr]

Plicker time!

Practice

  1. Find your numerical pair
  2. Introduce yourself (name, year, major, hometown)
  3. Discuss the problems on the handout and record your thoughts.

Principles of Experimental Design

Control: Compare treatment of interest to a control group.

Randomization: Randomly assign subjects to treatments.

Replication: Within a study, replicate by collecting a sufficiently large sample, or replicate the entire study.

Blocking: If there are variables that are known or suspected to affect the response variable, first group subjects into blocks based on these variables, and then randomly assign the cases within each block to treatment groups.
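
As a rough sketch of how blocking and randomization fit together, the code below assigns subjects to treatment or control at random, separately within each block. The subject IDs, the blocking variable (`risk`), and the group labels are all hypothetical.

```r
set.seed(42)  # arbitrary seed for reproducibility

# Hypothetical subjects with a blocking variable thought to affect the response
subjects <- data.frame(
  id   = 1:12,
  risk = rep(c("low", "high"), times = c(8, 4))
)

# Within each block, shuffle an equal number of treatment and control labels
subjects$group <- NA_character_
for (b in unique(subjects$risk)) {
  in_block <- subjects$risk == b
  labels <- rep(c("treatment", "control"), length.out = sum(in_block))
  subjects$group[in_block] <- sample(labels)
}

# Each block should be split evenly between the two groups
table(subjects$risk, subjects$group)
```

Because the shuffle happens inside each block, the low-risk and high-risk subjects are each split evenly between treatment and control, so the blocking variable cannot end up concentrated in one group.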

Replication

[Figure: psych]

Other key ideas

Placebo: fake treatment, often used as the control group for medical studies

Placebo effect: experimental units showing improvement simply because they believe they are receiving a special treatment

Blinding: when experimental units do not know whether they are in the control or treatment group

Double-blind: when both the experimental units and the researchers do not know who is in the control and who is in the treatment group